Reinforcement Learning in Supervised Problem Domains

Author

  • Thomas Frank Rückstieß
Abstract

Despite continuous advances in computing technology, today’s brute-force data processing approaches may not provide the advantage needed to win the race against the ever-growing amount of data witnessed over the last decades. In this thesis, we discuss novel methods and algorithms that are capable of directing attention to relevant details and analysing them in sequence to overcome the processing bottleneck and keep up with this data explosion. In the first of three parts, a novel exploration technique for Policy Gradient Reinforcement Learning is presented, which replaces traditional additive random exploration with state-dependent exploration, exploring on a higher, more strategic level. We show that this new exploration method converges faster and finds better global solutions than random exploration. The second part of this thesis introduces the concept of “data consumption” and discusses means to minimise it in supervised learning tasks by deriving classification as a sequential decision process and making it accessible to Reinforcement Learning methods. Depending on previously selected features and the internal belief state of a classifier, the next feature is chosen by a sequential online feature selection that learns which features are most informative at each time step. In experiments, this attentive hybrid learning system shows a significant reduction in the data required for correct classification. Finally, the third major contribution of this thesis is a novel sequence learning approach that learns an explicit contextual state while traversing a sequence. This context helps distinguish the current input and mitigates the need for a predictor capable of dealing with sequential data. We show the close relationship to concepts from theoretical computer science, in particular deterministic finite automata and regular languages, and demonstrate experimentally the capabilities of this hybrid algorithm.
All three parts share a tight integration of Reinforcement Learning and Supervised Learning, which not only delivers an orthogonal view of this research but also establishes, for the first time, a general framework for such hybrid algorithms.
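
The contrast the abstract draws between additive random exploration and state-dependent exploration can be illustrated with a minimal sketch. It is not taken from the thesis. It assumes a linear policy and one common reading of state-dependent exploration, in which a perturbation of the policy parameters is drawn once per episode so that the exploratory offset becomes a deterministic function of the state. All names (`linear_policy`, `theta_hat`, `sigma`) are hypothetical.

```python
import random


def linear_policy(theta, state):
    """Deterministic linear controller: action = theta . state (assumed form)."""
    return sum(t * s for t, s in zip(theta, state))


def act_additive_noise(theta, state, sigma, rng):
    """Additive random exploration: fresh Gaussian noise on every action."""
    return linear_policy(theta, state) + rng.gauss(0.0, sigma)


def sample_exploration_params(n, sigma, rng):
    """Draw one parameter perturbation, held fixed for a whole episode (assumption)."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def act_state_dependent(theta, theta_hat, state):
    """State-dependent exploration: the offset is itself a function of the state."""
    return linear_policy(theta, state) + linear_policy(theta_hat, state)


rng = random.Random(0)
theta = [0.5, -0.2]
state = [1.0, 2.0]

# Within one episode, revisiting the same state gives the same exploratory action.
theta_hat = sample_exploration_params(len(theta), 0.1, rng)
sde_first = act_state_dependent(theta, theta_hat, state)
sde_second = act_state_dependent(theta, theta_hat, state)

# With additive noise, the same state yields a different action on each visit.
add_first = act_additive_noise(theta, state, 0.1, rng)
add_second = act_additive_noise(theta, state, 0.1, rng)
```

In this sketch the exploratory behaviour stays consistent across an episode, which is one way to read "exploring on a higher, more strategic level"; the thesis itself may define the exploration function differently.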

Similar resources

Semi-Supervised Apprenticeship Learning

In apprenticeship learning we aim to learn a good policy by observing the behavior of an expert or a set of experts. In particular, we consider the case where the expert acts so as to maximize an unknown reward function defined as a linear combination of a set of state features. In this paper, we consider the setting where we observe many sample trajectories (i.e., sequences of states) but only...

Identifying Intention Posts in Discussion Forums

This paper proposes to study the problem of identifying intention posts in online discussion forums. For example, in a discussion forum, a user wrote “I plan to buy a camera,” which indicates a buying intention. This intention can be easily exploited by advertisers. To the best of our knowledge, there is still no reported study of this problem. Our research found that this problem is particular...

Rollout Sampling Approximate Policy (arXiv:0805.2027v1 [cs.LG], 14 May 2008)

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which...

Approximate Policy Iteration with Demonstration Data

We propose an algorithm to solve uncertain sequential decision-making problems that utilizes two different types of data sources. The first is the data available in the conventional reinforcement learning setup: an agent interacts with the environment and receives a sequence of state transition samples alongside the corresponding reward signal. The second data source, which differentiates the s...

Structural Abstraction Experiments in Reinforcement Learning

A challenge in applying reinforcement learning to large problems is how to manage the explosive increase in storage and time complexity. This is especially problematic in multi-agent systems, where the state space grows exponentially in the number of agents. Function approximation based on simple supervised learning is unlikely to scale to complex domains on its own, but structural abstraction ...

A Survey of Current Techniques for Reinforcement Learning

This survey considers response generating systems that improve their behaviour using reinforcement learning. The difference between unsupervised learning, supervised learning, and reinforcement learning is described. Two general problems concerning learning systems are presented: the credit assignment problem and the problem of perceptual aliasing. Notations and some general issues concerning re...

Publication date: 2016